Abstract: In this paper we construct a deterministic polynomial-time
algorithm for the problem where a set of users is
interested in gaining access to a common file, but where each 
has only partial knowledge of the file. We further assume the 
existence of another set of terminals in the system, called helpers, 
who are not interested in the common file, but who are willing 
to help the users. Given that the collective information of all 
the terminals is sufficient to allow recovery of the entire file, 
the goal is to minimize the (weighted) sum of bits that these 
terminals need to exchange over a noiseless public channel in 
order to achieve this goal. Based on established connections to
the multi-terminal secrecy problem, our algorithm also implies 
a polynomial-time method for constructing the largest shared 
secret key in the presence of an eavesdropper. We consider the 
following side-information settings: (i) side-information in the 
form of uncoded packets of the file, where the terminals' side- 
information consists of subsets of the file; (ii) side-information 
in the form of linearly correlated packets, where the terminals 
have access to linear combinations of the file packets; and (iii) the 
general setting where the terminals' side-information has an
arbitrary (i.i.d.) correlation structure. We provide a polynomial- 
time algorithm (in the number of terminals) that finds the optimal 
rate allocations for these terminals, and then determines an 
explicit optimal transmission scheme for cases (i) and (ii). 

I. Introduction 

In recent years cellular systems have witnessed significant 
improvements in terms of data rates, and are nearly approach- 
ing the theoretical limits in terms of the physical layer spectral 
efficiency. At the same time, the rapid growth in the popu- 
larity of data-enabled mobile devices, such as smart phones 
and tablets, and the resulting explosion in demand for more 
throughput are challenging our abilities even with the current
highly efficient cellular systems. One of the major bottlenecks 
in scaling the throughput with the increasing number of mobile 
devices is the "last mile" wireless link between the base station 
and the mobile devices - a resource that is shared among 
many terminals served within the cell. This motivates the study 
of paradigms where cell phone devices can cooperate among 
themselves to get the desired data in a peer-to-peer fashion 
without solely relying on the base station. 

An example of such a setting is shown in Figure 1, where
a base station wants to deliver the same file to multiple 

Fig. 1. An example of the data exchange problem with helpers. A base station
has a file formed of four packets $w_1, \ldots, w_4 \in \mathbb{F}_{q^n}$ and wants to deliver it to
two users over an unreliable wireless channel. Additionally, there is a terminal
in the system that is in the range of the base station, but is not interested
in the file. However, it is willing to help the two users to obtain the file.
The base station stops transmitting once all terminals collectively have all the
packets, even if individually they have only subsets of the packets. They can
then cooperate among themselves to recover the users' missing packets. If the
goal is to minimize the total number of communicated bits, the helper transmits
packet $w_1 + w_3$, while user 2 transmits packet $w_4$, where the addition is in
$\mathbb{F}_{q^n}$.

geographically-close users over an unreliable wireless down- 
link. We assume that some terminals, which are in the range 
of the base station, are not interested in the file, but due to 
their proximity to the base station, they are able to overhear 
some of its transmissions. Moreover, we assume that these 
terminals are willing to help in distributing the file to the 
respective users. We will refer to these terminals as helpers. In 
the example of Figure 1 we assume that the file consists of four
equally sized packets $w_1$, $w_2$, $w_3$, and $w_4$ belonging to some
finite field $\mathbb{F}_{q^n}$. Suppose that after a few initial transmission
attempts by the base station, the three terminals (including one
helper) individually receive only parts of the file (see Figure 1),
but collectively have the entire file. Now, if all terminals are 
in close vicinity and can communicate with each other, then, 
it is much more desirable and efficient, in terms of resource 
usage, to reconcile the file among users by letting all terminals 
"talk" to each other without involving the base station. The 
cooperation among the terminals has the following advantages: 

• Local communication among terminals has a smaller 
footprint in terms of interference, thus allowing one 
to use the shared resources (code, time or frequency) 
freely without penalizing the base station's resources, i.e., 
higher resource reuse factor. 

• Transmissions within the close group of terminals are
much more reliable than from the base station to any
terminal due to geographical proximity of terminals.

• This cooperation allows for the file recovery even when
the connection to the base station is either unavailable
after the initial phase of transmission, or it is too weak
to meet the delay requirement.

The problem of reconciling a file among multiple wireless 
users having parts of it while minimizing the cost in terms of 
the total number of bits exchanged is known in the literature 
as the data exchange problem and was introduced by El 
Rouayheb et al. in [1]. In the problem formulation of the data
exchange problem it is assumed that all the terminals in the
system are interested in recovering the entire file, i.e., there
are no helpers. For the data exchange problem without helpers,
a randomized algorithm was proposed in [2] and [3], while
deterministic polynomial-time algorithms were proposed in [4],
[5].

In this paper we consider a scenario with helpers, and linear 
communication cost. With respect to the example considered here, if
user 1, user 2 and the helper transmit $R_1$, $R_2$ and $R_3$ bits,
respectively, the data exchange problem with helpers would
correspond to minimizing the weighted sum-rate
$\alpha_1 R_1 + \alpha_2 R_2 + \alpha_3 R_3$ such that, when the communication is over,
user 1 and user 2 can recover the entire file. It can be shown
that for the case when $\alpha_1 = \alpha_2 = \alpha_3 = 1$, the minimum
communication cost is 2 and can be achieved by the following
coding scheme: user 2 transmits packet $w_4$, and the helper
transmits $w_1 + w_3$, where the addition is over the underlying
field $\mathbb{F}_{q^n}$. This corresponds to the optimal rate allocation
$R_2 = R_3 = 1$ symbol in $\mathbb{F}_{q^n}$. If there were no helper in the
system, it would take a total of 3 transmissions to reconcile the
file among the two users: user 1 would have to transmit one packet and
user 2 two packets. Thus, the helpers can contribute
to lowering the total communication cost in the system. 

The discussion above considers only a simple form of side- 
information, where different terminals observe partial uncoded 
"raw" packets of the original file. Content distribution net- 
works are increasingly using coding, such as Fountain codes 
or linear network codes, to improve the system efficiency [6].
In such scenarios, the side-information representing the partial 
knowledge gained by the terminals would be coded and in 
the form of linear combinations of the original file packets, 
rather than the raw packets themselves. The previous two 
cases of side-information ("raw" and coded) can be regarded 
as special cases of the more general problem where the side- 
information has arbitrary correlation among the data observed 
by the different terminals and where the goal is to minimize 
the weighted total communication cost. In [7], Csiszár and
Narayan posed a related security problem referred to as the
"multi-terminal key agreement" problem. They showed that
delivering the file to the users with the minimum number of bits
exchanged over the public channel is sufficient to maximize
the size of the secret key shared between the users. This
result establishes a connection between the multi-terminal key
agreement problem and the data exchange problem with helpers.
The authors of [7] solve the key agreement problem by formulating
it as a linear program (LP) with an exponential number of rate
constraints, corresponding to all possible cut-sets that need to be satisfied.

In this paper, we make the following contributions. First, 
we provide a deterministic polynomial time algorithm for 
finding an optimal rate allocation, w.r.t. a linear weighted 
sum-rate cost needed to deliver the file to all users when all 
terminals have arbitrarily correlated side-information. For the 
data exchange problem with helpers, this algorithm computes 
the optimal rate allocation in polynomial time for the case of 
linearly coded side-information (including the "raw" packets 
case) and for the general linear cost functions (including the 
sum-rate case). Second, for the data exchange problem
with helpers, with raw or linearly coded side-information, we
propose an efficient communication scheme design based on
the algebraic network coding framework [8], [9].

II. System Model and Preliminaries 

In this paper, we consider a setup with $m$ terminals, out of
which some subset is interested in gaining access to
a file or a random process. Let $X_1, X_2, \ldots, X_m$, $m \geq 2$,
denote the components of a discrete memoryless multiple
source (DMMS) with a given joint probability mass function.
Each terminal $j \in \mathcal{M} = \{1, 2, \ldots, m\}$ observes $n$ i.i.d. realizations
of the corresponding random variable $X_j$.

Let $\mathcal{A} = \{1, 2, \ldots, k\} \subseteq \mathcal{M}$ be the subset of terminals,
called users, who are interested in gaining access to the file,
i.e., learning the joint process $X_{\mathcal{M}} = (X_1, \ldots, X_m)$. The
remaining terminals $\{k+1, \ldots, m\}$ serve as helpers, i.e., they
are not interested in recovering the file, but they are willing
to help users in the set $\mathcal{A}$ to obtain it. In [7], Csiszár and
Narayan showed that, to deliver the file to all users in a setup
with a general DMMS, interactive communication is not needed.
As a result, in the sequel we can assume without loss of generality
that the transmission of each terminal is only a function of its own
initial observations. Let $F_i = f_i(X_i^n)$ represent the transmission of
terminal $i \in \mathcal{M}$, where $f_i(\cdot)$ is any desired mapping of the
observations $X_i^n$. For each user in $\mathcal{A}$ to recover the
entire file, the transmissions $F_i$, $i \in \mathcal{M}$, should satisfy

$$\lim_{n \to \infty} \frac{1}{n} H\left(X_{\mathcal{M}}^n \mid \mathbf{F}, X_l^n\right) = 0, \quad \forall l \in \mathcal{A}, \tag{1}$$

where $X_{\mathcal{M}}^n = (X_1^n, X_2^n, \ldots, X_m^n)$ and $\mathbf{F} = (F_1, F_2, \ldots, F_m)$.

Definition 1. A rate tuple $\mathbf{R} = (R_1, R_2, \ldots, R_m)$ is an
achievable data exchange (DE) rate tuple if there exists
a communication scheme with transmitted messages
$\mathbf{F} = (F_1, F_2, \ldots, F_m)$ that satisfies (1), and is such that

$$R_i = \lim_{n \to \infty} \frac{1}{n} H(F_i), \quad \forall i \in \mathcal{M}. \tag{2}$$

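It is easy to show using cut-set bounds that all the achievable
DE rate tuples necessarily belong to the following region

$$\mathcal{R} = \left\{ \mathbf{R} : R(\mathcal{S}) \geq H(X_{\mathcal{S}} \mid X_{\mathcal{S}^c}),\ \forall \mathcal{S} \subset \mathcal{M},\ \mathcal{A} \not\subseteq \mathcal{S} \right\}, \tag{3}$$

where $R(\mathcal{S}) = \sum_{i \in \mathcal{S}} R_i$. Also, using a random coding
argument, it can be shown that the rate region $\mathcal{R}$ is an
achievable rate region [7].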



In this work, we aim to design a polynomial complexity
algorithm that delivers the file to all users in $\mathcal{A}$ while
simultaneously minimizing a linear communication cost function
$\sum_{i=1}^{m} \alpha_i R_i$, where $\underline{\alpha} = (\alpha_1, \ldots, \alpha_m)$, $0 \leq \alpha_i < \infty$, is
an $m$-dimensional vector of non-negative finite weights. We
allow the $\alpha_i$'s to be arbitrary non-negative constants, to account
for the case when communication of some group of terminals
is more expensive compared to the others, e.g., setting $\alpha_1$ to
be a large value compared to the other weights minimizes the
rate allocated to user 1. This goal can be formulated as
the following linear program:

$$\min_{\mathbf{R}} \sum_{i=1}^{m} \alpha_i R_i, \tag{4}$$

$$\text{s.t. } R(\mathcal{S}) \geq H(X_{\mathcal{S}} \mid X_{\mathcal{S}^c}), \quad \forall \mathcal{S} \subset \mathcal{M},\ \mathcal{A} \not\subseteq \mathcal{S}.$$

A. Finite Linear Source Model 

In general, efficient content distribution networks use
coding such as fountain codes or linear network codes. As a
result, the terminals' observations take the form of linear
combinations of the original packets forming the file, rather
than the uncoded data themselves, as is the case in the conventional
data exchange problem. This linear correlation source model
is known in the literature as the finite linear source model [10].

Next, we briefly describe the finite linear source model. Let
$q$ be some power of a prime. Consider the $N$-dimensional
random vector $\mathbf{W} \in \mathbb{F}_{q^n}^{N}$ whose components are independent
and uniformly distributed over the elements of $\mathbb{F}_{q^n}$. Then, in
the linear source model, the observation of the $i$-th terminal is simply
given by

$$\mathbf{X}_i = \mathbf{A}_i \mathbf{W}, \quad i \in \mathcal{M}, \tag{5}$$

where $\mathbf{A}_i$, a matrix with $N$ columns, is the observation matrix of terminal $i$.
It is easy to verify that for the finite linear source model,

$$\frac{H(X_i)}{\log q^n} = \operatorname{rank}(\mathbf{A}_i). \tag{6}$$

Henceforth for the finite linear source model we will use the 
entropy of the observations and the rank of the observation 
matrix interchangeably. 
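
As a rough illustration of (6), the sketch below treats entropies as ranks and checks a candidate rate tuple against the cut-set region (3) by brute-force enumeration. The three-terminal model (W = (a, b), X_1 = a, X_2 = b, X_3 = a + b, users {1, 2}, helper 3), the prime field GF(3) standing in for the underlying field, and the names `rank_mod_p`, `cond_entropy` and `in_region` are all assumptions made for this example. The enumeration is exponential in m and is not the algorithm developed in this paper.

```python
from itertools import combinations

def rank_mod_p(rows, p):
    """Rank of a matrix (list of rows) over the prime field GF(p)."""
    rows = [[v % p for v in r] for r in rows]
    rank = 0
    for col in range(len(rows[0]) if rows else 0):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], -1, p)  # modular inverse, Python 3.8+
        rows[rank] = [(v * inv) % p for v in rows[rank]]
        for r in range(len(rows)):
            if r != rank and rows[r][col]:
                f = rows[r][col]
                rows[r] = [(a - f * b) % p for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank

def cond_entropy(obs, S, p):
    """H(X_S | X_{S^c}) in units of log q^n: rank of all rows minus rank of S^c rows."""
    all_rows = [row for A in obs for row in A]
    rest = [row for i, A in enumerate(obs) if i not in S for row in A]
    return rank_mod_p(all_rows, p) - rank_mod_p(rest, p)

def in_region(obs, users, rates, p):
    """Check R(S) >= H(X_S | X_{S^c}) for every proper subset S with A not inside S."""
    m = len(obs)
    for size in range(1, m):
        for S in combinations(range(m), size):
            if set(users) <= set(S):
                continue  # A is contained in S: region (3) imposes no constraint
            if sum(rates[i] for i in S) < cond_entropy(obs, set(S), p):
                return False
    return True

# Hypothetical model (0-based indices): X_1 = a, X_2 = b, X_3 = a + b.
obs = [[[1, 0]], [[0, 1]], [[1, 1]]]
users = [0, 1]
print(in_region(obs, users, [0, 0, 1], p=3))  # True: the helper alone sends a + b
print(in_region(obs, users, [1, 1, 0], p=3))  # True: the users exchange a and b
print(in_region(obs, users, [1, 0, 0], p=3))  # False: user 1 can never learn b
```

In this toy model the helper lowers the sum-rate from 2 to 1, which mirrors the role of the helper in the introductory example.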

III. Deterministic Algorithm 

We begin this section by exploring the case when the set
$\mathcal{A}$ consists of only one user. Then, by using the methodology
of [11], we extend our solution to the case when the set $\mathcal{A}$
has an arbitrary number of users.

A. Deterministic Algorithm when $|\mathcal{A}| = 1$

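Let user $1 \in \mathcal{M}$ be the only user interested in the
file, i.e., $\mathcal{A} = \{1\}$. This is known as the multi-terminal
Slepian-Wolf problem [12], for which the achievable rate region has
the following form:

$$\mathcal{R}_1 \triangleq \left\{ \mathbf{R} : R(\mathcal{S}) \geq H(X_{\mathcal{S}} \mid X_{\mathcal{S}^c}, X_1),\ \forall \mathcal{S} \subseteq \mathcal{M} \setminus \{1\} \right\}.$$

Hence, the underlying optimization problem has the following
form

$$\min_{\mathbf{R}} \sum_{j \in \mathcal{M} \setminus \{1\}} \alpha_j R_j, \quad \text{s.t. } \mathbf{R} \in \mathcal{R}_1. \tag{7}$$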

Optimization problem (7) can be solved analytically due to
the fact that the set function

$$f(\mathcal{S}) = H(X_{\mathcal{S}} \mid X_{\mathcal{S}^c}, X_1), \quad \forall \mathcal{S} \subseteq \mathcal{M} \setminus \{1\} \tag{8}$$

is supermodular (see [13] for the formal definition). Therefore,
optimization problem (7) is over a supermodular polyhedron
$\mathcal{R}_1$. From combinatorial optimization theory it is known
that Edmonds' greedy algorithm [14] renders an analytical
solution to this problem (see Algorithm 1).
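
For completeness, one way to see the supermodularity of (8) is sketched below. It assumes that the complement $\mathcal{S}^c$ in (8) is taken within $\mathcal{M} \setminus \{1\}$; the symbols $\mathcal{N}$ and $h$ are introduced only for this derivation and rely on the standard submodularity of joint entropy.

```latex
% Notation introduced for this sketch only:
%   \mathcal{N} = \mathcal{M} \setminus \{1\},  h(\mathcal{T}) = H(X_{\mathcal{T}}, X_1) for \mathcal{T} \subseteq \mathcal{N}.
\begin{align*}
f(\mathcal{S}) &= H(X_{\mathcal{S}} \mid X_{\mathcal{N} \setminus \mathcal{S}}, X_1)
               = H(X_{\mathcal{M}}) - h(\mathcal{N} \setminus \mathcal{S}), \\
h(\mathcal{T}_1) + h(\mathcal{T}_2) &\geq h(\mathcal{T}_1 \cup \mathcal{T}_2) + h(\mathcal{T}_1 \cap \mathcal{T}_2)
               \qquad \text{(submodularity of entropy)}, \\
f(\mathcal{S}) + f(\mathcal{T})
  &= 2 H(X_{\mathcal{M}}) - h(\mathcal{N} \setminus \mathcal{S}) - h(\mathcal{N} \setminus \mathcal{T}) \\
  &\leq 2 H(X_{\mathcal{M}}) - h\bigl(\mathcal{N} \setminus (\mathcal{S} \cap \mathcal{T})\bigr)
        - h\bigl(\mathcal{N} \setminus (\mathcal{S} \cup \mathcal{T})\bigr) \\
  &= f(\mathcal{S} \cap \mathcal{T}) + f(\mathcal{S} \cup \mathcal{T}).
\end{align*}
```

The inequality uses $(\mathcal{N} \setminus \mathcal{S}) \cup (\mathcal{N} \setminus \mathcal{T}) = \mathcal{N} \setminus (\mathcal{S} \cap \mathcal{T})$ and $(\mathcal{N} \setminus \mathcal{S}) \cap (\mathcal{N} \setminus \mathcal{T}) = \mathcal{N} \setminus (\mathcal{S} \cup \mathcal{T})$.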



Algorithm 1 Edmonds' algorithm applied to our problem
1: Set $j_1, j_2, \ldots, j_{m-1}$ to be an ordering of $\{1, 2, \ldots, m\} \setminus \{1\}$ such that $\alpha_{j_1} \leq \alpha_{j_2} \leq \cdots \leq \alpha_{j_{m-1}}$.
2: for $i = 1$ to $m - 1$ do
3:   $R^*_{j_i} = H(X_{j_i} \mid X_1, X_{j_1}, X_{j_2}, \ldots, X_{j_{i-1}})$.
4: end for



Example 1. Consider a system with $m = 6$ terminals
$\mathcal{M} = \{1, 2, 3, 4, 5, 6\}$. For convenience, we express the underlying
data vector as $\mathbf{W} = [a \ \ b \ \ c]^T \in \mathbb{F}_{q^n}^{3}$, where $a, b, c$ are
independent uniform random variables in $\mathbb{F}_{q^n}$. Let us consider
the case where each node has the following observations:
$X_1 = \{a + b\}$, $X_2 = \{a + c\}$, $X_3 = \{b + c\}$, $X_4 = \{a\}$,
$X_5 = \{b\}$, $X_6 = \{c\}$. Let us assume that user 1 is interested in
recovering the vector $\mathbf{W}$ such that the underlying communication
cost is $\sum_{i=2}^{6} R_i$.

It immediately follows from Algorithm 1 that a solution to
this problem is $R_2^* = R_3^* = 1$, and $R_4^* = R_5^* = R_6^* = 0$. In
other words, user 1 is missing 2 linear equations in order to
be able to decode all 3 data packets.
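
The greedy rule of Algorithm 1 can be sketched for the finite linear source model as follows, with conditional entropies replaced by rank increments as in (6). This is a minimal sketch under stated assumptions: the prime fields GF(3) and GF(2) stand in for the underlying field, all weights are set to 1, ties are broken by terminal index, and the names `rank_mod_p` and `edmonds_rates` are hypothetical. Note that in characteristic 2 we have b + c = (a + b) + (a + c), so the greedy order there selects a different pair of terminals, while the sum-rate stays equal to 2.

```python
def rank_mod_p(rows, p):
    """Rank of a matrix (list of rows) over the prime field GF(p)."""
    rows = [[v % p for v in r] for r in rows]
    rank = 0
    for col in range(len(rows[0]) if rows else 0):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], -1, p)  # modular inverse, Python 3.8+
        rows[rank] = [(v * inv) % p for v in rows[rank]]
        for r in range(len(rows)):
            if r != rank and rows[r][col]:
                f = rows[r][col]
                rows[r] = [(a - f * b) % p for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank

def edmonds_rates(obs, weights, user, p):
    """Greedy rates for a single user: visit the other terminals by increasing
    weight; each one sends only what is new given the user's own observation
    and the terminals visited before it."""
    order = sorted((j for j in range(len(obs)) if j != user),
                   key=lambda j: weights[j])  # stable sort: ties keep index order
    known = list(obs[user])
    rates = [0] * len(obs)
    for j in order:
        before = rank_mod_p(known, p)
        known = known + list(obs[j])
        rates[j] = rank_mod_p(known, p) - before  # H(X_j | X_user, earlier ones)
    return rates

# Observation matrices of Example 1 (0-based indices; rows act on [a, b, c]).
obs = [[[1, 1, 0]], [[1, 0, 1]], [[0, 1, 1]],
       [[1, 0, 0]], [[0, 1, 0]], [[0, 0, 1]]]
weights = [1] * 6
print(edmonds_rates(obs, weights, user=0, p=3))  # [0, 1, 1, 0, 0, 0]
print(edmonds_rates(obs, weights, user=0, p=2))  # [0, 1, 0, 1, 0, 0]
```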

B. Deterministic Algorithm when $|\mathcal{A}| > 1$

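In this section we extend the results from the previous
section to the case where the set $\mathcal{A}$ contains an arbitrary number
of users. Optimization problem (4) can be written as follows

$$\min_{\mathbf{Z}, \mathbf{R}} \sum_{i=1}^{m} \alpha_i Z_i, \tag{9}$$

$$\text{s.t. } Z_i \geq R_i^{(l)}, \quad \forall l \in \mathcal{A},\ \forall i \in \mathcal{M} \setminus \{l\},$$

$$\mathbf{R}^{(l)} \in \mathcal{R}_l, \quad \forall l \in \mathcal{A},$$

where

$$\mathcal{R}_l = \left\{ \mathbf{R} : R(\mathcal{S}) \geq H(X_{\mathcal{S}} \mid X_{\mathcal{S}^c}, X_l),\ \forall \mathcal{S} \subseteq \mathcal{M} \setminus \{l\} \right\}.$$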

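Equivalence between the optimization problems (4) and (9)
follows from the fact that the transmissions of all terminals in $\mathcal{M}$
have to be such that all users in $\mathcal{A}$ can learn $X_{\mathcal{M}}$. Optimization
problem (9) has an exponential number of constraints, which
makes it challenging to solve in polynomial time. To obtain a
polynomial-time solution we consider the Lagrangian dual of
problem (9):

$$\max_{\Lambda} \sum_{l=1}^{k} g^{(l)}(\Lambda^{(l)}), \tag{10}$$

$$\text{s.t. } \sum_{l=1}^{k} \lambda_i^{(l)} = \alpha_i,\ \forall i \in \mathcal{M}, \qquad \lambda_i^{(l)} \geq 0,\ \forall l \in \mathcal{A},\ \forall i \in \mathcal{M} \setminus \{l\},$$

where

$$g^{(l)}(\Lambda^{(l)}) = \min_{\mathbf{R}^{(l)}} \sum_{i \in \mathcal{M} \setminus \{l\}} \lambda_i^{(l)} R_i^{(l)}, \quad \text{s.t. } \mathbf{R}^{(l)} \in \mathcal{R}_l. \tag{11}$$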

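The dual variable $\Lambda$ in the above problem is represented in matrix
form as follows:

$$\Lambda = \begin{bmatrix} \lambda_1^{(1)} & \lambda_2^{(1)} & \cdots & \lambda_m^{(1)} \\ \lambda_1^{(2)} & \lambda_2^{(2)} & \cdots & \lambda_m^{(2)} \\ \vdots & \vdots & \ddots & \vdots \\ \lambda_1^{(k)} & \lambda_2^{(k)} & \cdots & \lambda_m^{(k)} \end{bmatrix}. \tag{12}$$

We denote by $\Lambda_i$ and $\Lambda^{(l)}$ the $i$-th column vector and the $l$-th row
vector of the matrix $\Lambda$, respectively. Moreover, we denote by

$$\mathbf{R} = \begin{bmatrix} R_1^{(1)} & R_2^{(1)} & \cdots & R_m^{(1)} \\ R_1^{(2)} & R_2^{(2)} & \cdots & R_m^{(2)} \\ \vdots & \vdots & \ddots & \vdots \\ R_1^{(k)} & R_2^{(k)} & \cdots & R_m^{(k)} \end{bmatrix} \tag{13}$$

the rate matrix whose $l$-th row, here denoted by $\mathbf{R}^{(l)}$, represents
an optimizer of the problem (11) w.r.t. the weight vector $\Lambda^{(l)}$.
In order to ensure consistency with the optimization problem (10),
observe that $\lambda_l^{(l)} = 0$ and $R_l^{(l)} = 0$, $\forall l = 1, \ldots, k$.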

For any given user $l \in \mathcal{A}$, the function (11) appearing in the objective of
the dual problem (10) can be computed analytically using
Algorithm 1. The optimization problem (10) is a linear program
(LP) with $O(m \cdot k)$ constraints, which makes
it possible to solve it in polynomial time (w.r.t. the number of
terminals). To solve the optimization problem (10) we apply
a subgradient method, as described below.

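Starting with a feasible iterate $\Lambda[0]$ w.r.t. the optimization
problem (10), every subsequent iterate $\Lambda[n]$ can be recursively
represented as a Euclidean projection of the vector

$$\Lambda_i[n] = \Lambda_i[n-1] + \theta[n-1] \cdot \mathbf{R}_i[n-1], \quad \forall i \in \mathcal{M}, \tag{14}$$

onto the hyperplane $\left\{ \Lambda_i \geq 0 \mid \sum_{l=1}^{k} \lambda_i^{(l)} = \alpha_i \right\}$, where
$\theta[n-1]$ is the step size and
$\mathbf{R}_i[n-1]$ is the $i$-th column of the rate matrix $\mathbf{R}[n-1]$. The
Euclidean projection ensures that every iterate $\Lambda[n]$ is feasible
w.r.t. the optimization problem (10). It is not hard to verify
that the following initial choice of $\Lambda[0]$ is feasible w.r.t. the
problem (10):

$$\lambda_i^{(l)}[0] = \begin{cases} \dfrac{\alpha_i}{k} & \text{if } i \in \mathcal{M} \setminus \mathcal{A}, \\[4pt] \dfrac{\alpha_i}{k-1} & \text{if } i \in \mathcal{A} \setminus \{l\}, \\[4pt] 0 & \text{if } i = l, \end{cases} \qquad \forall i \in \mathcal{M},\ \forall l \in \mathcal{A}. \tag{15}$$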


By appropriately choosing the step size $\theta[n]$ in each iteration
(14), it is guaranteed that the subgradient method
described above converges to the optimal solution of the
problem (10). To recover the primal optimal solution from the
iterates $\Lambda[n]$ we use results from [15], where at each iteration
of (14), the primal iterate is constructed as follows:

$$\hat{\mathbf{R}}[n] = \sum_{j=1}^{n} \mu_j^{(n)} \mathbf{R}[j], \tag{16}$$

where

$$\sum_{j=1}^{n} \mu_j^{(n)} = 1, \quad \mu_j^{(n)} \geq 0, \quad \text{for } j = 1, 2, \ldots, n. \tag{17}$$



By carefully choosing the step size $\theta[n]$, $\forall n$, in (14) and the
convex combination coefficients $\mu_j^{(n)}$, $\forall j = 1, \ldots, n$, $\forall n$, it
is guaranteed that (16) converges to the minimizer of (9), and
therefore to the minimizer of the original problem (4). In [15],
the authors proposed several choices for $\{\theta[n]\}$ and $\{\mu_j^{(n)}\}$
which lead to the primal recovery. Here we list some of them.

1) $\theta[n] = \frac{a}{b + cn}$, $\forall n$, where $a > 0$, $b \geq 0$, $c > 0$,
   $\mu_j^{(n)} = \frac{1}{n}$, $\forall j = 1, \ldots, n$, $\forall n$,

2) $\theta[n] = n^{-a}$, $\forall n$, where $0 < a < 1$,
   $\mu_j^{(n)} = \frac{1}{n}$, $\forall j = 1, \ldots, n$, $\forall n$.

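
The two step-size rules and the uniform averaging weights can be written compactly as in the sketch below. The function names and default constants are assumptions for illustration; with uniform weights $\mu_j^{(n)} = 1/n$, the average in (16) can be updated incrementally without storing earlier iterates.

```python
def theta_harmonic(n, a=1.0, b=1.0, c=1.0):
    """Choice 1: theta[n] = a / (b + c*n), with a > 0, b >= 0, c > 0."""
    return a / (b + c * n)

def theta_power(n, a=0.5):
    """Choice 2: theta[n] = n^(-a), with 0 < a < 1 and n >= 1."""
    return n ** (-a)

def running_average(avg_prev, new, n):
    """Uniform average of n matrices, given the average of the first n-1
    (avg_prev) and the n-th matrix (new); equals (16) with mu_j = 1/n."""
    return [[p + (x - p) / n for p, x in zip(row_p, row_x)]
            for row_p, row_x in zip(avg_prev, new)]

print(theta_harmonic(1))                         # 0.5
print(theta_power(4))                            # 0.5
print(running_average([[1.0]], [[3.0]], n=2))    # [[2.0]]
```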
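Now, it is only left to compute an optimal rate allocation
w.r.t. the problem defined in (4). Let $\mathbf{R}^*$ and $\mathbf{Z}^*$ be the
optimal rate vectors of the problems (4) and (9), respectively.
As we pointed out earlier, $\mathbf{R}^* = \mathbf{Z}^*$, where $\mathbf{Z}^*$ can be
computed from the primal iterate matrix $\hat{\mathbf{R}}[n]$ in (16) for a sufficiently large $n$, as
follows:

$$Z_i^* = \max\left\{ \hat{R}_i^{(1)}[n], \hat{R}_i^{(2)}[n], \ldots, \hat{R}_i^{(k)}[n] \right\}, \quad \forall i \in \mathcal{M}. \tag{18}$$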



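Pseudo code of the algorithm described in this section is shown
below (see Algorithm 2).

Algorithm 2 Optimal DE rate allocation
1: Initialize $\Lambda[0]$ according to (15), and compute $\mathbf{R}[0]$ from $\Lambda[0]$ using Algorithm 1
2: Set the step sizes $\theta[j]$, $\forall j$, and $\mu_j^{(n)} = \frac{1}{n}$, $\forall j \in \{1, \ldots, n\}$
3: for $j = 1$ to $n$ do
4:   Project $\Lambda_i[j] = \Lambda_i[j-1] + \theta[j-1] \cdot \mathbf{R}_i[j-1]$ onto the hyperplane $\left\{ \Lambda_i \geq 0 \mid \sum_{l=1}^{k} \lambda_i^{(l)} = \alpha_i \right\}$, $\forall i \in \mathcal{M}$
5:   for $l = 1$ to $k$ do
6:     Compute $\mathbf{R}^{(l)}[j]$ using Algorithm 1 for the weight vector $\Lambda^{(l)}[j]$
7:   end for
8: end for
9: $\hat{\mathbf{R}}[n] = \sum_{j=1}^{n} \mu_j^{(n)} \mathbf{R}[j]$
10: $Z_i^* = \max_{l \in \mathcal{A}} \hat{R}_i^{(l)}[n]$, $\forall i \in \mathcal{M}$; $\mathbf{R}^* = \mathbf{Z}^*$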
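
A compact sketch of the whole procedure for the finite linear source model is given below. It is illustrative only and rests on several assumptions: the prime field GF(3) stands in for the underlying field, the step size is $\theta[j] = 1/j$, the averaging weights are uniform, users occupy indices 0, ..., k-1, each rate matrix is computed from the current dual iterate before the projection step, and all function names are hypothetical. The returned pair brackets the optimal cost: by weak duality the best dual value never exceeds it, and the cost of the averaged primal point is never below it. For the source model of Example 1 with users {1, 2, 3} and unit weights, the two values should approach each other as the iteration count grows.

```python
def rank_mod_p(rows, p):
    """Rank of a matrix (list of rows) over the prime field GF(p)."""
    rows = [[v % p for v in r] for r in rows]
    rank = 0
    for col in range(len(rows[0]) if rows else 0):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], -1, p)  # modular inverse, Python 3.8+
        rows[rank] = [(v * inv) % p for v in rows[rank]]
        for r in range(len(rows)):
            if r != rank and rows[r][col]:
                f = rows[r][col]
                rows[r] = [(a - f * b) % p for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank

def greedy(obs, w, user, p):
    """Algorithm 1 for one user: cheapest terminals first, rank increments as rates."""
    order = sorted((j for j in range(len(obs)) if j != user), key=lambda j: w[j])
    known, rates = list(obs[user]), [0] * len(obs)
    for j in order:
        before = rank_mod_p(known, p)
        known = known + list(obs[j])
        rates[j] = rank_mod_p(known, p) - before
    return rates

def project_simplex(v, s):
    """Euclidean projection of v onto {x >= 0, sum(x) = s} (sort-based method)."""
    if s <= 0:
        return [0.0] * len(v)
    css, tau = 0.0, 0.0
    for i, ui in enumerate(sorted(v, reverse=True), start=1):
        css += ui
        if ui - (css - s) / i > 0:
            tau = (css - s) / i
    return [max(x - tau, 0.0) for x in v]

def de_rate_allocation(obs, alpha, k, p, iters=2000):
    m = len(obs)
    # Feasible starting point as in (15).
    lam = [[0.0 if i == l else alpha[i] / (k if i >= k else k - 1)
            for i in range(m)] for l in range(k)]
    avg = [[0.0] * m for _ in range(k)]
    best_dual = float("-inf")
    for j in range(1, iters + 1):
        R = [greedy(obs, lam[l], l, p) for l in range(k)]   # rows solve (11)
        dual = sum(lam[l][i] * R[l][i] for l in range(k) for i in range(m))
        best_dual = max(best_dual, dual)
        # Uniform running average of the rate matrices, as in (16).
        avg = [[a + (r - a) / j for a, r in zip(ra, rr)] for ra, rr in zip(avg, R)]
        theta = 1.0 / j                                     # assumed step size
        for i in range(m):                                  # step (14), column-wise
            free = [l for l in range(k) if l != i]
            col = project_simplex([lam[l][i] + theta * R[l][i] for l in free], alpha[i])
            for l, x in zip(free, col):
                lam[l][i] = x
    Z = [max(avg[l][i] for l in range(k)) for i in range(m)]  # as in (18)
    return Z, best_dual

obs = [[[1, 1, 0]], [[1, 0, 1]], [[0, 1, 1]],
       [[1, 0, 0]], [[0, 1, 0]], [[0, 0, 1]]]
Z, dual = de_rate_allocation(obs, [1.0] * 6, k=3, p=3)
print("dual lower bound:", dual)
print("primal cost:", sum(Z), "rates:", Z)
```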



C. Code Construction for the Linear Source Model 

In this Section we briefly address the question of the optimal 
code construction for the linear source model. For that matter, 
let us consider the following example. 



Example 2. Let us consider the same source model as in
Example 1, where $\mathcal{A} = \{1, 2, 3\}$, and the objective function is
$\sum_{i=1}^{6} R_i$. Applying the algorithm described above, we obtain

$$R_1^* = R_2^* = R_3^* = \frac{1}{4}, \quad R_4^* = R_5^* = R_6^* = \frac{1}{2}. \tag{19}$$




Fig. 2. Multicast network constructed from the source model and the optimal
rate tuple $R_1^* = R_2^* = R_3^* = 1$, $R_4^* = R_5^* = R_6^* = 2$. Each user receives
side-information from "itself" through links $(s_i, r_i)$, $i = 1, 2, 3$, and from
the other terminals through links $(t_i, r_j)$, $i = 1, \ldots, 6$, $j = 1, 2, 3$, $i \neq j$.

This solution suggests that in order to design a scheme that
performs optimally, it is necessary to split all the packets into 4
equally sized chunks. In other words, the terminals' observations
can be written as $X_1 = a + b = \{a_1 + b_1, a_2 + b_2, a_3 + b_3, a_4 + b_4\}$,
$X_2 = a + c = \{a_1 + c_1, a_2 + c_2, a_3 + c_3, a_4 + c_4\}$, etc.,
where all $a_i$'s, $b_i$'s and $c_i$'s belong to $\mathbb{F}_{q^{n/4}}$. For
this "extended" source model we have that the optimal rate
allocation is $R_1^* = R_2^* = R_3^* = 1$, $R_4^* = R_5^* = R_6^* = 2$.
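
The chunk count used above is the least common denominator of the fractional rates. A minimal sketch of this bookkeeping, with the hypothetical helper name `integer_extension`, is:

```python
from fractions import Fraction
from functools import reduce
from math import gcd

def integer_extension(rates):
    """Return (c, scaled): c is the least common denominator of the rates and
    scaled[i] = c * rates[i] is the integer rate in the c-chunk extended model."""
    fr = [Fraction(r) for r in rates]
    c = reduce(lambda x, y: x * y // gcd(x, y), (f.denominator for f in fr), 1)
    return c, [int(f * c) for f in fr]

# Rates of (19): 1/4 for the three users, 1/2 for the three helpers.
rates = [Fraction(1, 4)] * 3 + [Fraction(1, 2)] * 3
print(integer_extension(rates))  # (4, [1, 1, 1, 2, 2, 2])
```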

The next question we need to address is how to design the
transmissions of each user. Starting from an optimal (integer)
rate allocation, we first construct the corresponding multicast
network (see Figure 2). In this construction, notice that there
are several types of nodes. First, there is a super node $S$ that
possesses all the packets. Each user in the set $\mathcal{A}$ plays the
role of a transmitter and a receiver, while the helpers act only
as transmitters. To model this, we denote by $s_1, \ldots, s_6$ the
"sending" nodes, and by $r_1$, $r_2$ and $r_3$ the receiving nodes.
To model the side-information at users 1, 2 and 3, we introduce
links $(s_i, r_i)$, $i = 1, 2, 3$, of capacity 4, which route the
users' observations to the corresponding receiving nodes. To
model the broadcast nature of each transmission, we introduce
"dummy" nodes $t_1, \ldots, t_6$, such that the capacity of the link
$(s_i, t_i)$ is the same as the capacity of the links $(t_i, r_j)$, $j \neq i$, and is equal
to $R_i^*$, $\forall i \in \mathcal{M}$.
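
The links described above can be enumerated programmatically, as in the sketch below. The node labels follow the text, while the function name and the dictionary representation are assumptions; the links leaving the super node $S$ are omitted because their capacities are not specified here. As a sanity check, the total capacity entering each receiver equals 12, the size of the file in chunks (3 packets of 4 chunks each).

```python
def multicast_links(rates, users, side_capacity):
    """Map each directed link (tail, head) to its capacity.
    rates: {terminal index: integer rate R_i*}; users: indices of the set A."""
    links = {}
    for i in users:
        links[(f"s{i}", f"r{i}")] = side_capacity      # a user's own side-information
    for i, r in rates.items():
        links[(f"s{i}", f"t{i}")] = r                  # transmission of terminal i
        for j in users:
            if j != i:
                links[(f"t{i}", f"r{j}")] = r          # broadcast to every other user
    return links

rates = {1: 1, 2: 1, 3: 1, 4: 2, 5: 2, 6: 2}
links = multicast_links(rates, users=[1, 2, 3], side_capacity=4)
inflow_r1 = sum(c for (tail, head), c in links.items() if head == "r1")
print(len(links), inflow_r1)  # 24 12
```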

To solve for actual transmissions of each terminal, we apply
the algebraic network coding approach [8], with an appropriately
designed source matrix $\mathbf{A}$ which corresponds to the
side-information of all terminals. Finally, the network code for
the data exchange problem with helpers can be constructed
in polynomial time from the algorithms provided in [9], which
are based on a simultaneous transfer matrix completion.



IV. Conclusion and Extensions 

In this paper we study the data exchange problem with 
helpers. We provide a deterministic polynomial time algorithm 
for minimizing the weighted sum-rate cost of communication. 
We show that the data exchange problem with only one user 
and many helpers can be solved analytically using Edmonds' 
algorithm. Further, using the single-user solution as a building
block, we show how one can solve the more general problem
with an arbitrary number of users. Several extensions are of
interest. For instance, we can consider a modification of the 
original data exchange problem where only helpers are allowed 
to transmit. Starting from a single user case, it is easy to 
see that an achievable rate tuple must satisfy all the cut-set 
constraints over the helper set such that the user is always on 
the receiving side of the cut. Minimizing the weighted sum- 
rate cost over all achievable rate tuples can again be done using 
Edmonds' algorithm (see Algorithm 1). Finally, the extension
to the multiple user case corresponds to the weighted sum- 
rate minimization over all rate tuples that are simultaneously 
achievable for all users. This optimization problem can be 
solved in polynomial time using the same approach as in 
Algorithm 2.